Detecting anomalies in a distributed application

ABSTRACT

Anomalies are detected in a distributed application that runs on a plurality of nodes to execute at least first and second workloads. The method of detecting anomalies includes collecting first network traffic data of the first workload and second network traffic data of the second workload during a first period of execution of the first and second workloads, collecting third network traffic data of the first workload and fourth network traffic data of the second workload during a second period of execution of the first and second workloads, and detecting an anomaly in the distributed application based on a comparison of the third network traffic data against the first network traffic data or a comparison of the fourth network traffic data against the second network traffic data. Anomalies may also be detected by comparing network traffic data of two groups of containers executing the same workload.

BACKGROUND

Techniques have been developed to detect anomalies in monolithic applications as a way to alert the user that malware might be present. Such techniques are not readily transportable to distributed applications, such as those deployed onto a Kubernetes® platform, which run as ephemeral containers within pods on different nodes of the Kubernetes platform. Detection of anomalies in distributed applications is quite different from detecting anomalies in monolithic applications for several reasons.

The distributed application is composed of several parts, each designed to execute a particular workload and thus behave differently from other parts. There is a need to understand the normal behavior of each part in order to detect whether a certain behavior is anomalous or not. By contrast, the monolithic application is a single unit of software and an anomaly is detected when the software as a whole deviates from its expected behavior.

In distributed applications deployed onto a Kubernetes platform, data is distributed between pods and there is data traffic in and out of each pod. By contrast, data is maintained in a single place in a monolithic application and the only data traffic is data traffic in and out of the monolithic application. As a result, anomalous behavior in distributed applications may go undetected using anomaly detection techniques developed for monolithic applications, which evaluate only the data traffic in and out of the applications for anomalous behavior.

Further, in distributed applications deployed onto a Kubernetes platform, pods in some sets, e.g., replica sets, are expected to behave similarly on average. In such situations, one of the pods in a set could behave anomalously or all of the pods in the set could behave anomalously. Anomaly detection techniques developed for monolithic applications may be unable to detect anomalous behavior when one of the pods in the set behaves differently from other pods in the set.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a client cluster running a distributed application and an anomaly detection system according to embodiments.

FIG. 2 also illustrates a client cluster running a distributed application with a modified configuration and the anomaly detection system according to embodiments.

FIG. 3 is a flow diagram that illustrates the steps of storing activity data for compiling statistics for detecting anomalies, according to embodiments.

FIG. 4 is a flow diagram that illustrates the steps of detecting anomalies in a distributed application using the compiled statistics, according to embodiments.

DETAILED DESCRIPTION

One or more embodiments provide techniques for detecting anomalies in a distributed application. In particular, techniques described herein are able to detect anomalous behavior in a distributed application deployed onto a Kubernetes platform. The anomalous behavior may be detected in a set of pods executing one of the workloads of the distributed application or it may be detected in one of the pods in the set, based on a comparison with their respective past normal behavior. In addition, the anomaly may be detected in a pod based on a comparison with the behavior of other pods in the set.

FIG. 1 illustrates a client cluster 100 running a distributed application and an anomaly detection system 200 according to embodiments. In the embodiments illustrated herein, the distributed application is deployed onto a Kubernetes platform. However, in alternative embodiments, the distributed application may be deployed onto other types of computing environments.

Client cluster 100 is a Kubernetes cluster and, to simplify the description, only pods 151 executing activity monitor 150, pods 161-163 executing Workload A, and pods 171-172 executing Workload B, all running in the Kubernetes cluster, are shown. As is known in the art, pods run on nodes of the Kubernetes cluster, and one or more containers run inside pods. In addition, for illustrative purposes, Workload A and Workload B are depicted as workloads of the distributed application. In general, a distributed application may have any number of workloads.

In some embodiments, pods 161-163 form a replica set for executing Workload A and pods 171-172 form a replica set for executing Workload B. Pods of the same replica set are expected to behave similarly on average and are typically executed in different nodes. The idea behind creating a replica set with pods distributed across multiple nodes is to split the load of the workload computation among the multiple nodes. Therefore, in the embodiments where pods 161-163 form a replica set, each of pods 161-163 runs on a different node of the Kubernetes cluster. Similarly, in the embodiments where pods 171-172 form a replica set, each of pods 171-172 runs on a different node of the Kubernetes cluster.

One or more pods 151 of activity monitor 150 monitor data traffic between the pods, egress data traffic of the pods, and ingress data traffic of the pods, and collect activity data from the monitored data traffic, e.g., from packet headers of data packets that are transmitted or received by the pods. The collected activity data includes: port information (e.g., port number), protocol information (e.g., TCP), sender information (e.g., workload name for internal Kubernetes traffic or IP address for external non-Kubernetes traffic), and receiver information (e.g., workload name for internal Kubernetes traffic or IP address for external non-Kubernetes traffic), of the data traffic into and out of the pods. Another one of pods 151 of activity monitor 150 then transmits the collected activity data to anomaly detection system 200, along with metadata that identifies the pod and the workload associated with the data traffic and whether the data traffic is egress data traffic, ingress data traffic, or inter-workload data traffic.
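For illustration only, the activity data and accompanying metadata described above might be represented as a single record along the following lines. The class and field names (Direction, ActivityRecord, pod_id, and so on) and the sample values are hypothetical and are not taken from the embodiments.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Traffic classification carried in the metadata (hypothetical names)."""
    EGRESS = "egress"
    INGRESS = "ingress"
    INTER_WORKLOAD = "inter-workload"


@dataclass(frozen=True)
class ActivityRecord:
    """One observation derived from a packet header (illustrative only)."""
    pod_id: str           # metadata: pod associated with the traffic
    workload_id: str      # metadata: workload associated with the traffic
    direction: Direction  # metadata: egress, ingress, or inter-workload
    port: int             # port information, e.g. 443
    protocol: str         # protocol information, e.g. "TCP"
    sender: str           # workload name (internal) or IP address (external)
    receiver: str         # workload name (internal) or IP address (external)


# Example: an egress observation from a pod of Workload A (made-up values).
record = ActivityRecord(
    pod_id="pod-161",
    workload_id="workload-a",
    direction=Direction.EGRESS,
    port=443,
    protocol="TCP",
    sender="workload-a",
    receiver="8.8.8.8",
)
```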

Anomaly detection system 200 includes databases and processes that are executed in one or more computing devices to perform anomaly detection according to embodiments. The databases include an activity data database 210, a model database 230, and an alerts database 250. The processes include an anomaly model creator 220, an anomaly tester 240, and an alerts service 260.

Anomaly detection system 200 may be provisioned in a public or private cloud computing environment, and even in the same data center in which client cluster 100 is provisioned. As such, the scope of embodiments is not limited to a particular computing environment of anomaly detection system 200. In addition, anomaly detection system 200 may be operated by a third party and connected to a plurality of client clusters of different organizations to whom the third party is providing anomaly detection services.

Activity data database 210 receives activity data from activity monitor 150 over a network 180 and stores the activity data in a structured manner for use by anomaly model creator 220. Anomaly model creator 220 examines the activity data stored by activity data database 210, and compiles the statistics in Table 1 from the activity data for each pod and each workload.

TABLE 1

Egress Traffic
  Frequency of egress traffic to private IP addresses
  Frequency of egress traffic to public IP addresses
  Egress traffic using specific domain
  Egress traffic using specific protocol
  Egress traffic using specific protocol:port
  Message and error rates (for all egress traffic)
  Message and error rates (for egress traffic to specific IP addresses)

Ingress Traffic
  Frequency of ingress traffic from private IP addresses
  Frequency of ingress traffic from public IP addresses
  Message and error rates (for all ingress traffic)
  Message and error rates (for ingress traffic from specific IP addresses)

Inter-Workload Traffic
  Message rate in data traffic to particular workload IDs
  Message rate in data traffic to all workloads
  Error rate in data traffic to particular workload IDs
  Error rate in data traffic to all workloads
  Combined message rate and error rate in data traffic to particular workload IDs
  Combined message rate and error rate in data traffic to all workloads
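As a rough sketch of how statistics such as those in Table 1 might be compiled for each pod and each workload, the following counts egress events to private and public IP addresses. The function name, the tuple layout of the input, and the use of the Python standard library's notion of a private address are all assumptions made for illustration.

```python
import ipaddress
from collections import Counter


def egress_frequencies(events):
    """Count egress events to private and public IPs per pod and per workload.

    `events` is an iterable of (pod_id, workload_id, destination_ip) tuples,
    a hypothetical simplification of the stored activity data.
    """
    per_pod = Counter()
    per_workload = Counter()
    for pod_id, workload_id, dest_ip in events:
        # Assumption: "private" follows the standard library's definition.
        is_private = ipaddress.ip_address(dest_ip).is_private
        scope = "private" if is_private else "public"
        per_pod[(pod_id, scope)] += 1
        per_workload[(workload_id, scope)] += 1
    return per_pod, per_workload


# Made-up sample data for two pods of the same workload.
events = [
    ("pod-161", "workload-a", "10.0.0.5"),
    ("pod-161", "workload-a", "8.8.8.8"),
    ("pod-162", "workload-a", "10.0.0.7"),
]
per_pod, per_workload = egress_frequencies(events)
```

Keeping separate per-pod and per-workload counters mirrors the description, in which the same statistics are compiled at both levels.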

During a time period designated by the administrator of client cluster 100 or during any time period where client cluster 100 is expected to exhibit normal behavior, the compiled statistics are stored in a structured manner by model database 230 as a reference model. The administrator of client cluster 100 has the option of instructing anomaly detection system 200 to delete a reference model, and designate another time period for collecting the activity data for creating the reference model. The instruction to delete the reference model may be given, for example, when the administrator of client cluster 100 detects malicious or unusual activity in client cluster 100 (e.g., by applying techniques known in the art) during the time period in which activity data used in creating the reference model is being collected. After the reference model has been created and maintained (not deleted), anomaly detection system 200 performs anomaly detection for client cluster 100 by performing the following steps.

First, activity data database 210 stores the activity data received from activity monitor 150 over network 180 in a structured manner. Second, anomaly model creator 220 examines the activity data stored by activity data database 210, and compiles the statistics in Table 1 from the activity data for each pod and each workload. Third, anomaly tester 240 compares the statistics compiled for each workload and each pod against the reference model and determines whether there are deviations from the normal behavior represented by the reference model in any of the pods or any of the workloads that need to be flagged as anomalies. Alerts database 250 stores the flagged anomalies in a structured manner, and alerts service 260 issues alerts indicating the anomalies to client cluster 100 according to preferences set by the administrator of client cluster 100.

In the embodiments, anomaly tester 240 flags each of the behaviors of a pod or a workload listed in Table 2 as an anomaly:

TABLE 2

Egress Traffic
  Egress traffic is now present; no egress traffic before.
  Egress traffic to private IP addresses not seen before.
  Egress traffic to public IP addresses not seen before.
  Egress traffic to a particular domain not seen before.
  Egress traffic using a particular protocol not used before.
  Egress traffic using a particular protocol:port not used before.
  Message rate for all egress traffic differs from one in reference model by more than a threshold percentage.
  Error rate for all egress traffic differs from one in reference model by more than a threshold percentage.
  Message rate for egress traffic to a specific IP address differs from a corresponding one in reference model by more than a threshold percentage.
  Error rate for egress traffic to a specific IP address differs from a corresponding one in reference model by more than a threshold percentage.

Ingress Traffic
  Ingress traffic is now present; no ingress traffic before.
  Ingress traffic from private IP addresses not seen before.
  Ingress traffic from public IP addresses not seen before.
  Message rate for all ingress traffic differs from one in reference model by more than a threshold percentage.
  Error rate for all ingress traffic differs from one in reference model by more than a threshold percentage.
  Message rate for ingress traffic to a specific IP address differs from a corresponding one in reference model by more than a threshold percentage.
  Error rate for ingress traffic to a specific IP address differs from a corresponding one in reference model by more than a threshold percentage.

Inter-Workload Traffic
  Message rate in data traffic to a particular workload ID differs from one in reference model by more than a threshold percentage.
  Message rate in data traffic to all workloads differs from one in reference model by more than a threshold percentage.
  Error rate in data traffic to particular workload ID differs from one in reference model by more than a threshold percentage.
  Error rate in data traffic to all workloads differs from one in reference model by more than a threshold percentage.
  Combined message rate and error rate in data traffic to particular workload ID differs from one in reference model by more than a threshold percentage.
  Combined message rate and error rate in data traffic to all workloads differs from one in reference model by more than a threshold percentage.
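The "not seen before" behaviors in Table 2 amount to checking whether the current statistics contain values that are absent from the reference model. A minimal sketch follows, assuming that each kind of egress attribute is held as a set of strings; the function name, dictionary keys, and sample values are hypothetical.

```python
def unseen_values(current, reference):
    """Return values observed now that are absent from the reference model."""
    return sorted(set(current) - set(reference))


# Made-up reference model and current observations for one workload.
reference_egress = {
    "domains": {"api.example.com"},
    "protocol_ports": {"TCP:443"},
}
current_egress = {
    "domains": {"api.example.com", "files.example.net"},
    "protocol_ports": {"TCP:443", "TCP:22"},
}

# Any non-empty list corresponds to a behavior that would be flagged.
anomalies = {
    kind: unseen_values(current_egress[kind], reference_egress[kind])
    for kind in reference_egress
}
```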

The threshold percentage for flagging a behavior listed in Table 2 as an anomaly is configurable, e.g., according to a preference of the administrator of client cluster 100 or by anomaly detection system 200. In addition, different threshold percentages may be set for different behaviors listed in Table 2.
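A threshold test with per-behavior settings might look like the sketch below. The description does not define how the percentage difference is computed, so measuring it relative to the reference value, and treating any non-zero rate against a zero reference as a deviation, are assumptions; the behavior names and threshold values are hypothetical.

```python
DEFAULT_THRESHOLD_PCT = 20.0  # hypothetical default

# Hypothetical per-behavior overrides, e.g. set by an administrator.
THRESHOLDS_PCT = {
    "egress_message_rate": 30.0,
    "egress_error_rate": 10.0,
}


def deviates(behavior, observed, reference):
    """Return True if `observed` differs from `reference` by more than
    the threshold percentage configured for `behavior`."""
    threshold = THRESHOLDS_PCT.get(behavior, DEFAULT_THRESHOLD_PCT)
    if reference == 0:
        # Assumption: any traffic where the reference has none is a deviation.
        return observed != 0
    change_pct = abs(observed - reference) / reference * 100.0
    return change_pct > threshold
```

With these settings, a message rate of 125 against a reference of 100 is flagged for a behavior that uses the 20 percent default but not for one configured at 30 percent.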

Over time, the configuration of the distributed application may be modified so that a different number of pods execute the workloads of the distributed application. The configuration of the distributed application that is modified from that of FIG. 1 is depicted in FIG. 2. In FIG. 2, the number of pods executing Workload A decreased from three to two and the number of pods executing Workload B increased from two to four. In such situations, the reference model will need to be updated before it can be used to detect anomalous behavior.

In some embodiments, in addition to or as an alternative to comparing the statistics compiled for each pod against the reference model, anomaly tester 240 compares the statistics compiled for a pod based on activity data collected during a current anomaly detection period against the statistics compiled for other pods in the same replica set as the pod based on activity data collected during the current anomaly detection period. Any behavior listed in Table 3 that is observed in the pod but not in the other pods is flagged as an anomaly. Alerts database 250 stores the flagged anomalies in a structured manner, and alerts service 260 issues alerts indicating the anomalies to client cluster 100 according to preferences set by the administrator of client cluster 100. Because the comparisons in these embodiments are made against statistics that are compiled based on activity data collected during the same anomaly detection period, they may be used as the basis for detecting anomalies even after modifications are made to the configuration of the distributed application.

TABLE 3

Egress Traffic of pod
  Egress traffic is present; no egress traffic from other pods.
  Egress traffic to private IP addresses; no egress traffic from other pods to private IP addresses.
  Egress traffic to public IP addresses; no egress traffic from other pods to public IP addresses.
  Egress traffic to a particular domain; no egress traffic from other pods to the particular domain.
  Egress traffic using a particular protocol; no egress traffic from other pods using the particular protocol.
  Egress traffic using a particular protocol:port; no egress traffic from other pods using the particular protocol:port.
  Message rate for all egress traffic differs from those of other pods by more than a threshold percentage.
  Error rate for all egress traffic differs from those of other pods by more than a threshold percentage.
  Message rate for egress traffic to a specific IP address differs from those of other pods by more than a threshold percentage.
  Error rate for egress traffic to a specific IP address differs from those of other pods by more than a threshold percentage.

Ingress Traffic
  Ingress traffic is present; no ingress traffic to other pods.
  Ingress traffic from private IP addresses; no ingress traffic to other pods from private IP addresses.
  Ingress traffic from public IP addresses; no ingress traffic to other pods from public IP addresses.
  Message rate for all ingress traffic differs from those of other pods by more than a threshold percentage.
  Error rate for all ingress traffic differs from those of other pods by more than a threshold percentage.
  Message rate for ingress traffic to a specific IP address differs from those of other pods by more than a threshold percentage.
  Error rate for ingress traffic to a specific IP address differs from those of other pods by more than a threshold percentage.

Inter-Workload Traffic
  Message rate in data traffic to a particular workload ID differs from those of other pods by more than a threshold percentage.
  Message rate in data traffic to all workloads differs from those of other pods by more than a threshold percentage.
  Error rate in data traffic to particular workload ID differs from those of other pods by more than a threshold percentage.
  Error rate in data traffic to all workloads differs from those of other pods by more than a threshold percentage.
  Combined message rate and error rate in data traffic to particular workload ID differs from those of other pods by more than a threshold percentage.
  Combined message rate and error rate in data traffic to all workloads differs from those of other pods by more than a threshold percentage.

The threshold percentage for flagging a behavior listed in Table 3 as an anomaly is configurable, e.g., according to a preference of the administrator of client cluster 100 or by anomaly detection system 200. In addition, different threshold percentages may be set for different behaviors listed in Table 3.
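The rate comparisons among pods of the same replica set can be sketched as follows. The description does not say how "those of other pods" are aggregated, so using the median of the other pods' rates as the baseline is an assumption, as are the function name, pod names, and sample rates.

```python
from statistics import median


def peer_outliers(rates_by_pod, threshold_pct):
    """Return pods whose rate differs from the median rate of the other
    pods in the same replica set by more than `threshold_pct` percent."""
    outliers = []
    for pod_id, rate in rates_by_pod.items():
        peers = [r for p, r in rates_by_pod.items() if p != pod_id]
        if not peers:
            continue  # a single pod has nothing to be compared against
        baseline = median(peers)
        if baseline == 0:
            differs = rate != 0
        else:
            differs = abs(rate - baseline) / baseline * 100.0 > threshold_pct
        if differs:
            outliers.append(pod_id)
    return outliers


# Made-up message rates for a four-pod replica set, same detection period.
rates = {"pod-1": 100.0, "pod-2": 104.0, "pod-3": 98.0, "pod-4": 310.0}
flagged = peer_outliers(rates, threshold_pct=50.0)
```

A median is used here rather than a mean so that one extreme pod does not pull the baseline far enough to make its well-behaved peers look anomalous; other aggregations are equally possible.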

FIG. 3 is a flow diagram that illustrates the steps of storing activity data for compiling statistics for detecting anomalies, according to embodiments. The steps of FIG. 3 are carried out by activity data database 210, and begin at step 310 where activity data database 210 receives activity data and associated metadata from client cluster 100. Then, activity data database 210 at step 312 associates the received activity data with a pod ID and a workload ID that are specified in the received metadata, and at step 314 classifies the received activity data as egress data or ingress data according to the received metadata.

If the received activity data is egress data (step 316, Yes), activity data database 210 at step 318 records, in association with each of the pod ID and the workload ID, whether the data is transmitted to a public network (e.g., to public IP addresses) or to a private network (e.g., to private IP addresses). Then, activity data database 210 at step 319 records, in association with each of the pod ID and the workload ID, the port, protocol, and destination information that are in the collected activity data.

If the received activity data is ingress data (step 316, No; step 320, Yes), activity data database 210 at step 322 records, in association with each of the pod ID and the workload ID, whether the data is received from a public network (e.g., from public IP addresses) or from a private network (e.g., from private IP addresses).

If the received activity data is neither egress data nor ingress data (step 316, No; step 320, No), activity data database 210 determines that the activity data is associated with inter-workload traffic and step 324 is executed for the inter-workload data. Step 324 is also executed for egress data (after step 319) and for ingress data (after step 322). The message rates and error rates that are computed and recorded by activity data database 210 at step 324 include all of the message and error rates listed in Table 3. The process ends after step 324.
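The branching of FIG. 3 can be sketched as a single recording function. This is an illustrative reading of the flow, not the implementation of activity data database 210: the dictionary layout, the key names, the is_error flag, and the use of running message and error counts (which would be divided by the length of the collection window to obtain rates) are all assumptions.

```python
import ipaddress
from collections import defaultdict


def network_scope(ip):
    """Classify an address as 'private' or 'public' (standard-library definition)."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def record_activity(store, activity, metadata):
    """Record one activity item under both its pod ID and its workload ID."""
    # Step 312: associate the data with the pod ID and workload ID in the metadata.
    keys = [("pod", metadata["pod_id"]), ("workload", metadata["workload_id"])]
    # Step 314: classify the data according to the metadata.
    direction = metadata["direction"]
    for key in keys:
        entry = store[key]
        if direction == "egress":  # step 316, Yes
            entry["egress"].append({
                "scope": network_scope(activity["receiver"]),  # step 318
                "port": activity["port"],                      # step 319
                "protocol": activity["protocol"],
                "destination": activity["receiver"],
            })
        elif direction == "ingress":  # step 320, Yes
            entry["ingress"].append(
                {"scope": network_scope(activity["sender"])}   # step 322
            )
        # Step 324 runs for egress, ingress, and inter-workload data alike.
        entry["messages"] += 1
        entry["errors"] += 1 if activity.get("is_error", False) else 0


store = defaultdict(
    lambda: {"egress": [], "ingress": [], "messages": 0, "errors": 0}
)
record_activity(
    store,
    {"receiver": "8.8.8.8", "sender": "workload-a", "port": 53, "protocol": "UDP"},
    {"pod_id": "pod-161", "workload_id": "workload-a", "direction": "egress"},
)
record_activity(
    store,
    {"receiver": "workload-b", "sender": "workload-a", "port": 8080,
     "protocol": "TCP", "is_error": True},
    {"pod_id": "pod-161", "workload_id": "workload-a", "direction": "inter-workload"},
)
```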

FIG. 4 is a flow diagram that illustrates the steps of detecting anomalies in a distributed application using the compiled statistics, according to embodiments. The method of FIG. 4 is executed by anomaly tester 240 each time new activity data is stored by activity data database 210. Alternatively, the method of FIG. 4 may be executed by anomaly tester 240 according to a periodic schedule or after a predefined amount of new activity data is stored by activity data database 210.

The method of FIG. 4 begins with steps 410, 412, and 414, where anomaly tester 240 compiles statistics on egress data, ingress data, and inter-workload data for each pod and each workload based on the activity data stored by activity data database 210. Then, at step 416, anomaly tester 240 compares the statistics compiled for each pod against pods that are in the same replica set. Any behavior listed in Table 3 that is observed in the pod but not in the other pods is flagged as an anomaly. If the reference model is still valid for comparison because the distributed application has not gone through a change in its configuration since the last update of the reference model (step 418, No), anomaly tester 240 at step 420 compares the statistics compiled for each workload and each pod against the reference model. Any behavior listed in Table 2 that is observed in the pod or the workload is flagged as an anomaly. Step 422, which is described below, is executed after step 420.

If the reference model is not valid for comparison because the distributed application has gone through a change in its configuration since the last update of the reference model (step 418, Yes), anomaly tester 240 skips step 420 and executes step 422. At step 422, anomaly tester 240 determines if an anomaly is detected by the comparison made at step 416 or step 420. If so (step 422, Yes), anomaly tester 240 at step 424 notifies alerts database 250 of the detected anomalies, in response to which alerts database 250 stores the detected anomalies in a structured manner, and alerts service 260 issues alerts indicating the anomalies to client cluster 100 according to preferences set by the administrator of client cluster 100. The process terminates after step 424 or if no anomaly is detected by the comparison made at step 416 or step 420 (step 422, No).
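The control flow of steps 416 through 424 can be summarized in a short sketch. The function and parameter names are hypothetical, and the two comparison callables stand in for the Table 3 and Table 2 checks, which are stubbed here for illustration.

```python
def run_detection(stats, reference_model, config_changed,
                  compare_with_peers, compare_with_reference):
    """Return the anomalies found in one pass of the FIG. 4 flow (sketch)."""
    # Step 416: compare each pod against the other pods in its replica set.
    anomalies = list(compare_with_peers(stats))
    # Step 418: the reference model is usable only if the configuration
    # has not changed since the model was last updated.
    if not config_changed:
        # Step 420: compare each pod and workload against the reference model.
        anomalies.extend(compare_with_reference(stats, reference_model))
    # Steps 422 and 424: the caller stores and alerts on a non-empty result.
    return anomalies


# Stub comparison functions, for illustration only.
def peers(stats):
    return ["pod-3: egress to a domain unused by peer pods"]


def reference(stats, model):
    return ["workload-a: message rate deviates from reference"]


with_model = run_detection({}, {}, False, peers, reference)
without_model = run_detection({}, {}, True, peers, reference)
```

When the configuration has changed, only the peer comparison contributes to the result, which matches the skipping of step 420 described above.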

If the reference model is not valid for comparison because the configuration of the distributed application has been modified since the last update of the reference model, anomaly model creator 220 updates the reference model based on activity data generated after the configuration of the distributed application has been modified. After the update, anomaly tester 240 performs both the comparison of step 416 and the comparison of step 420.

The embodiments described herein may employ various computer-implemented operations involving data stored in computer systems. For example, these operations may require physical manipulation of physical quantities. Usually, though not necessarily, these quantities may take the form of electrical or magnetic signals, where the quantities or representations of the quantities can be stored, transferred, combined, compared, or otherwise manipulated. Such manipulations are often referred to in terms such as producing, identifying, determining, or comparing. Any operations described herein that form part of one or more embodiments may be useful machine operations.

One or more embodiments of the invention also relate to a device or an apparatus for performing these operations. The apparatus may be specially constructed for required purposes, or the apparatus may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. Various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

The embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, etc.

One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in computer readable media. The term computer readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer readable media may be based on any existing or subsequently developed technology that embodies computer programs in a manner that enables a computer to read the programs. Examples of computer readable media are hard drives, NAS systems, read-only memory (ROM), RAM, compact disks (CDs), digital versatile disks (DVDs), magnetic tapes, and other optical and non-optical data storage devices. A computer readable medium can also be distributed over a network-coupled computer system so that the computer readable code is stored and executed in a distributed fashion.

Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, certain changes may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation unless explicitly stated in the claims.

Virtualization systems in accordance with the various embodiments may be implemented as hosted embodiments, non-hosted embodiments, or as embodiments that blur distinctions between the two. Furthermore, various virtualization operations may be wholly or partially implemented in hardware. For example, a hardware implementation may employ a look-up table for modification of storage access requests to secure non-disk data.

Many variations, additions, and improvements are possible, regardless of the degree of virtualization. The virtualization software can therefore include components of a host, console, or guest OS that perform virtualization functions.

Plural instances may be provided for components, operations, or structures described herein as a single instance. Boundaries between components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention. In general, structures and functionalities presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionalities presented as a single component may be implemented as separate components. These and other variations, additions, and improvements may fall within the scope of the appended claims.

1. A method of detecting anomalies in a distributed application that runs on a plurality of nodes to execute a plurality of workloads, comprising: collecting first network traffic data of the workloads at a number of different times during a first period of execution of the workloads; examining metadata of each first network traffic data that is collected to determine a workload ID of a workload that generated the first network traffic data and to determine whether the first network traffic data is egress data, ingress data, or neither; upon determining that the first network traffic data is egress data, recording egress traffic information contained in the metadata in association with the workload ID; upon determining that the first network traffic data is ingress data, recording ingress traffic information contained in the metadata in association with the workload ID; upon determining that the first network traffic data is neither egress data nor ingress data, computing a data rate of the first network traffic data that is neither egress data nor ingress data and recording the computed data rate in association with the workload ID; collecting second network traffic data of the workloads at a number of different times during a second period of execution of the workloads after the first period of execution; examining metadata of each second network traffic data that is collected to determine a workload ID of a workload that generated the second network traffic data and to determine whether the second network traffic data is egress data, ingress data, or neither; and detecting one or more anomalies in the distributed application based on a comparison of egress traffic information contained in the metadata of the second network traffic data, ingress traffic information contained in the metadata of the second network traffic data, and a data rate of the second network traffic data that is neither egress data nor ingress data, respectively against the egress traffic information, the ingress traffic information, and the computed data rate that are recorded in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
2. The method of claim 1, wherein the egress network traffic information includes port, protocol, and domain specified in the egress data.
3. The method of claim 2, further comprising: computing a data rate of the egress data and recording the data rate of the egress data in association with the workload ID of the workload that generated the first network traffic data, wherein an anomaly in the distributed application is detected also based on a comparison of a data rate of the second network traffic data that is egress data against the data rate of the egress data that is recorded in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
4. The method of claim 1, wherein the ingress network traffic indicates whether the ingress data originated from a public network or a private network.
5. The method of claim 4, further comprising: computing a data rate of the ingress data and recording the data rate of the ingress data in association with the workload ID of the workload that generated the first network traffic data, wherein an anomaly in the distributed application is detected also based on a comparison of a data rate of the second network traffic data that is ingress data against the data rate of the ingress data that is recorded in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
6. (canceled)
7. (canceled)
8-14. (canceled)
15. A computing system for detecting anomalies in a distributed application that runs on a plurality of nodes to execute a plurality of workloads, said computing system comprising: a storage device; and an anomaly detection server, wherein the anomaly detection server is configured to: examine metadata of each first network traffic data that is collected at a number of different times during a first period of execution of the workloads to determine a workload ID of a workload that generated the first network traffic data and to determine whether the first network traffic data is egress data, ingress data, or neither; upon determining that the first network traffic data is egress data, record in the storage device egress traffic information contained in the metadata in association with the workload ID; upon determining that the first network traffic data is ingress data, record in the storage device ingress traffic information contained in the metadata in association with the workload ID; upon determining that the first network traffic data is neither egress data nor ingress data, compute a data rate of the first network traffic data that is neither egress data nor ingress data and record in the storage device the computed data rate in association with the workload ID; examine metadata of each second network traffic data that is collected at a number of different times during a second period of execution of the workloads after the first period of execution to determine a workload ID of a workload that generated the second network traffic data and to determine whether the second network traffic data is egress data, ingress data, or neither; and detect one or more anomalies in the distributed application based on a comparison of egress traffic information contained in the metadata of the second network traffic data, ingress traffic information contained in the metadata of the second network traffic data, and a data rate of the second network traffic data that is neither egress data nor ingress data, respectively against the egress traffic information, the ingress traffic information, and the computed data rate that are recorded in the storage device in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
 16. The computing system of claim 15, wherein the distributed application is deployed onto a Kubernetes platform.
 17. (canceled)
 18. The computing system of claim 15, wherein the egress network traffic information includes port, protocol, and domain specified in the egress data.
 19. The computing system of claim 15, wherein the ingress network traffic indicates whether the ingress data originated from a public network or a private network.
 20. (canceled)
 21. A non-transitory computer readable medium comprising instructions that are executable in a computer system, wherein the instructions when executed cause the computer system to carry out a method of detecting anomalies in a distributed application that runs on a plurality of nodes to execute a plurality of workloads, said method comprising: examining metadata of each first network traffic data that is collected at a number of different times during a first period of execution of the workloads to determine a workload ID of a workload that generated the first network traffic data and to determine whether the first network traffic data is egress data, ingress data, or neither; upon determining that the first network traffic data is egress data, recording egress traffic information contained in the metadata in association with the workload ID; upon determining that the first network traffic data is ingress data, recording ingress traffic information contained in the metadata in association with the workload ID; upon determining that the first network traffic data is neither egress data nor ingress data, computing a data rate of the first network traffic data that is neither egress data nor ingress data and recording the computed data rate in association with the workload ID; examining metadata of each second network traffic data that is collected at a number of different times during a second period of execution of the workloads after the first period of execution to determine a workload ID of a workload that generated the second network traffic data and to determine whether the second network traffic data is egress data, ingress data, or neither; and detecting one or more anomalies in the distributed application based on a comparison of egress traffic information contained in the metadata of the second network traffic data, ingress traffic information contained in the metadata of the second network traffic data, and a data rate of the second network traffic data that is neither egress data nor ingress data, respectively against the egress traffic information, the ingress traffic information, and the computed data rate that are recorded in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
 22. The non-transitory computer readable medium of claim 21, wherein the egress network traffic information includes port, protocol, and domain specified in the egress data.
 23. The non-transitory computer readable medium of claim 22, wherein said method further comprises: computing a data rate of the egress data and recording the data rate of the egress data in association with the workload ID of the workload that generated the first network traffic data, wherein an anomaly in the distributed application is detected also based on a comparison of a data rate of the second network traffic data that is egress data against the data rate of the egress data that is recorded in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
 24. The non-transitory computer readable medium of claim 21, wherein the ingress network traffic indicates whether the ingress data originated from a public network or a private network.
 25. The non-transitory computer readable medium of claim 24, wherein said method further comprises: computing a data rate of the ingress data and recording the data rate of the ingress data in association with the workload ID of the workload that generated the first network traffic data, wherein an anomaly in the distributed application is detected also based on a comparison of a data rate of the second network traffic data that is ingress data against the data rate of the ingress data that is recorded in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
 26. (canceled)
 27. (canceled)
 28. The computing system of claim 18, wherein the anomaly detection server is configured to: compute a data rate of the egress data and record in the storage device the data rate of the egress data in association with the workload ID of the workload that generated the first network traffic data, wherein an anomaly in the distributed application is detected also based on a comparison of a data rate of the second network traffic data that is egress data against the data rate of the egress data that is recorded in the storage device in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
 29. The computing system of claim 19, wherein the anomaly detection server is configured to: compute a data rate of the ingress data and record in the storage device the data rate of the ingress data in association with the workload ID of the workload that generated the first network traffic data, wherein an anomaly in the distributed application is detected also based on a comparison of a data rate of the second network traffic data that is ingress data against the data rate of the ingress data that is recorded in the storage device in association with the workload ID corresponding to the workload ID of the workload that generated the second network traffic data.
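
By way of illustration only, the recording and comparison steps recited in claims 15 and 21 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The `Baseline` class, the flow dictionary keys (`workload_id`, `direction`, `port`, `protocol`, `domain`, `origin`, `bytes`, `seconds`), and the `rate_tolerance` threshold are hypothetical names introduced here. The claims do not specify how the comparison is performed, so the sketch assumes a simple set-membership check for egress and ingress information and a multiplicative threshold for the data rate. It also assumes the egress/ingress/neither determination has already been made from the metadata.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """Behavior recorded for one workload ID during the first period (hypothetical structure)."""
    egress: set = field(default_factory=set)    # (port, protocol, domain) tuples
    ingress: set = field(default_factory=set)   # "public" or "private"
    internal_bytes: int = 0                     # traffic that is neither egress nor ingress
    internal_seconds: float = 0.0

    @property
    def internal_rate(self) -> float:
        """Average data rate (bytes per second) of the neither-egress-nor-ingress traffic."""
        if self.internal_seconds == 0:
            return 0.0
        return self.internal_bytes / self.internal_seconds


def record(baselines, flow):
    """First period: store traffic information in association with the workload ID."""
    baseline = baselines[flow["workload_id"]]
    if flow["direction"] == "egress":
        baseline.egress.add((flow["port"], flow["protocol"], flow["domain"]))
    elif flow["direction"] == "ingress":
        baseline.ingress.add(flow["origin"])
    else:
        # Neither egress nor ingress: accumulate totals so a data rate can be computed.
        baseline.internal_bytes += flow["bytes"]
        baseline.internal_seconds += flow["seconds"]


def detect(baselines, flow, rate_tolerance=2.0):
    """Second period: compare a flow against the baseline of the same workload ID."""
    baseline = baselines.get(flow["workload_id"])
    if baseline is None:
        return ["no baseline for workload"]
    anomalies = []
    if flow["direction"] == "egress":
        key = (flow["port"], flow["protocol"], flow["domain"])
        if key not in baseline.egress:
            anomalies.append(f"unseen egress destination {key}")
    elif flow["direction"] == "ingress":
        if flow["origin"] not in baseline.ingress:
            anomalies.append(f"unseen ingress origin {flow['origin']}")
    else:
        # Assumed comparison rule: flag rates above a multiple of the recorded rate.
        rate = flow["bytes"] / flow["seconds"]
        if rate > rate_tolerance * baseline.internal_rate:
            anomalies.append(f"data rate {rate:.1f} B/s exceeds baseline")
    return anomalies


# Usage: baselines are keyed by workload ID and created on first use.
baselines = defaultdict(Baseline)
```

In this sketch each second-period flow is looked up under its own workload ID, so a workload is compared only against the behavior recorded for that same workload, which mirrors the per-workload association recited in the claims.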